
    Scale Invariant Interest Points with Shearlets

    Shearlets are a relatively new directional multi-scale framework for signal analysis, which has been shown effective in enhancing signal discontinuities such as edges and corners at multiple scales. In this work we address the problem of detecting and describing blob-like features in the shearlet framework. We derive a measure which is very effective for blob detection and closely related to the Laplacian of Gaussian. We demonstrate that the measure satisfies the perfect scale invariance property in the continuous case. In the discrete setting, we derive algorithms for blob detection and keypoint description. Finally, we provide qualitative justifications of our findings as well as a quantitative evaluation on benchmark data. We also report experimental evidence that our method is well suited to compressed and noisy images, thanks to the sparsity property of shearlets.
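    A minimal sketch of the scale-invariance idea the abstract refers to: since the proposed measure is closely related to the Laplacian of Gaussian, a scale-normalized LoG blob detector illustrates the same multi-scale extremum search. The function name, scale set, and threshold below are illustrative assumptions, not the paper's shearlet-based measure.

```python
import numpy as np
from scipy import ndimage

def log_blob_detection(image, sigmas=(1.6, 2.2, 3.1, 4.4, 6.2), threshold=0.1):
    """Multi-scale blob detection via the scale-normalized Laplacian of Gaussian.

    Hypothetical stand-in for the paper's shearlet-based measure.
    """
    image = image.astype(np.float64)
    # Scale normalization (sigma^2 * LoG) makes responses comparable across
    # scales, so a blob gives an extremum at the scale matching its size.
    stack = np.stack([s**2 * ndimage.gaussian_laplace(image, s) for s in sigmas])
    # A blob is a local extremum of |response| over space *and* scale.
    mag = np.abs(stack)
    maxima = (mag == ndimage.maximum_filter(mag, size=3)) & (mag > threshold * mag.max())
    scale_idx, ys, xs = np.nonzero(maxima)
    return [(y, x, sigmas[k]) for k, y, x in zip(scale_idx, ys, xs)]
```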

    Space-Time Signal Analysis and the 3D Shearlet Transform

    In this work, we address the problem of analyzing video sequences by representing meaningful local space–time neighborhoods. We propose a mathematical model to describe relevant points as local singularities of a 3D signal, and we show that these local patterns can be nicely highlighted by the 3D shearlet transform, which is at the root of our work. Based on this mathematical framework, we derive an algorithm to represent space–time points which is very effective in analyzing video sequences. In particular, we show how points of the same nature have very similar representations, allowing us to compute different space–time primitives for a video sequence in an unsupervised way.
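    To make the unsupervised-primitives idea concrete, here is a hedged sketch that treats a video as a (T, H, W) volume, describes each space-time point by a small cube of 3D gradient magnitudes, and clusters the descriptors with k-means. The gradient-cube descriptor is a generic stand-in for the paper's 3D shearlet representation; all names and parameters are illustrative.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def spacetime_primitives(frames, points, patch=5, n_primitives=4):
    """Cluster local space-time neighborhoods into primitives, unsupervised.

    `frames` is a (T, H, W) array; `points` are (t, y, x) triples assumed to
    lie at least patch // 2 voxels away from the volume border.
    """
    volume = frames.astype(np.float64)
    # 3D gradient magnitude as a crude local-singularity indicator
    # (stand-in for 3D shearlet coefficients).
    g = np.sqrt(sum(ndimage.sobel(volume, axis=a) ** 2 for a in range(3)))
    r = patch // 2
    descs = [g[t - r:t + r + 1, y - r:y + r + 1, x - r:x + r + 1].ravel()
             for t, y, x in points]
    labels = KMeans(n_clusters=n_primitives, n_init=10).fit_predict(np.array(descs))
    return labels  # points with the same label share a space-time primitive
```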

    Detecting spatio-temporal interest points using the shearlet transform

    In this paper we address the problem of detecting spatio-temporal interest points in video sequences, and we introduce a novel detection algorithm based on the three-dimensional shearlet transform. By evaluating our method in different application scenarios, we show that we are able to extract meaningful spatio-temporal features from video sequences of human movements, including full-body movements selected from benchmark datasets of human actions and human–machine interaction sequences where the goal is to segment drawing activities into smaller action primitives.
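    The detection step common to such pipelines is a local-maximum search over a 3D response volume. Below is a hedged, detector-agnostic sketch: in the paper the response would be derived from 3D shearlet coefficients, while here it is simply any (T, H, W) saliency volume; window and threshold are assumptions.

```python
import numpy as np
from scipy import ndimage

def detect_st_interest_points(response, window=(5, 7, 7), threshold=0.2):
    """Pick spatio-temporal interest points as local maxima of a response volume.

    `response` is a (T, H, W) saliency volume, e.g. built from 3D shearlet
    coefficients (this sketch does not depend on how it was computed).
    """
    # Non-maximum suppression over a space-time window, then keep only
    # sufficiently strong responses.
    local_max = response == ndimage.maximum_filter(response, size=window)
    strong = response > threshold * response.max()
    t, y, x = np.nonzero(local_max & strong)
    return list(zip(t, y, x))  # (frame, row, col) triples
```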

    Adaptive Body Gesture Representation for Automatic Emotion Recognition

    We present a computational model and a system for the automated recognition of emotions starting from full-body movement. Three-dimensional motion data of full-body movements are obtained either from professional optical motion-capture systems (Qualisys) or from low-cost RGB-D sensors (Kinect and Kinect2). A number of features are then automatically extracted at different levels, from the kinematics of a single joint to more global expressive features inspired by psychology and humanistic theories (e.g., contraction index, fluidity, and impulsiveness). An abstraction layer based on dictionary learning further processes these movement features to increase the model's generality and to deal with the intra-class variability, noise, and incomplete information characterizing emotion expression in human movement. The resulting feature vector is the input for a classifier performing real-time automatic emotion recognition based on linear support vector machines. The recognition performance of the proposed model is presented and discussed, including the trade-off between the precision of the tracking measures (we compare the Kinect RGB-D sensor and the Qualisys motion-capture system) and the size of the training dataset. The resulting model and system have been successfully applied in the development of serious games for helping autistic children learn to recognize and express emotions by means of their full-body movement.
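    A minimal sketch of the pipeline shape the abstract describes, using scikit-learn: movement features are re-encoded as sparse codes via dictionary learning (the abstraction layer), then classified with a linear SVM. The data, dimensions, and hyperparameters below are placeholders, not the paper's.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

# Placeholder movement-feature matrix: one row per gesture segment, columns
# are extracted features (joint kinematics, contraction index, fluidity, ...).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))        # assumed data, for illustration
y_train = rng.integers(0, 4, size=200)      # 4 emotion classes, placeholder

# Abstraction layer: learn a dictionary and re-encode features as sparse
# codes, which helps with intra-class variability and noise.
dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, random_state=0)
codes_train = dico.fit(X_train).transform(X_train)

# Real-time-friendly classifier: a linear SVM on the sparse codes.
clf = LinearSVC().fit(codes_train, y_train)

X_new = rng.normal(size=(1, 32))
emotion = clf.predict(dico.transform(X_new))  # predicted emotion label
```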

    View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

    For autonomous vehicles, driving safely depends heavily on the capability to correctly perceive the environment in 3D space, hence the task of 3D object detection represents a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, training monocular methods requires a vast amount of annotated data. Interestingly, self-supervised approaches have recently been successfully applied to ease the training process and unlock access to widely available unlabelled data. While related research leverages different priors, including LiDAR scans and stereo images, such priors again limit usability. Therefore, in this work, we propose a novel approach to self-supervise 3D object detection from RGB sequences alone, leveraging multi-view constraints and weak labels. Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods that use LiDAR scans or stereo images.
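    To illustrate the multi-view consistency principle in its simplest form: a single 3D hypothesis should reproject close to the 2D evidence in every view. The sketch below scores one predicted 3D object center against per-view 2D box centers under known camera poses; it is illustrative only, and the paper's actual objective and weak-label terms are richer than this single reprojection term.

```python
import numpy as np

def project(K, T_cam_from_world, X_world):
    """Pinhole projection of a 3D world point into pixel coordinates."""
    X_cam = (T_cam_from_world @ np.append(X_world, 1.0))[:3]  # world -> camera
    u = K @ X_cam
    return u[:2] / u[2]

def multiview_consistency_loss(K, poses, centers_2d, X_pred):
    """Mean reprojection disagreement for one predicted 3D object center.

    `poses[v]` is a 4x4 camera-from-world matrix for view v, `centers_2d[v]`
    the observed 2D box center in that view, and `X_pred` the 3D hypothesis.
    All names and shapes are assumptions made for this sketch.
    """
    errors = [np.linalg.norm(project(K, poses[v], X_pred) - centers_2d[v])
              for v in poses]
    return float(np.mean(errors))
```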